Variational inference for Student-t MLP models
Authors
Corresponding author: [email protected] (Tel: +44 121 257 7718, Fax: +44 121 204 3685)
Abstract
This paper presents a novel methodology to infer the parameters of probabilistic models whose output noise follows a Student-t distribution. The method extends earlier work on models that are linear in their parameters to nonlinear multi-layer perceptrons (MLPs). It combines an EM algorithm with a variational approximation, the evidence procedure, and an optimisation algorithm. The technique was tested on two regression applications: a synthetic dataset and gas forward contract prices from the UK energy market. The results show that forecasting accuracy is significantly improved by using Student-t noise models.

Key words: Variational inference, Student-t noise, multilayer perceptrons, EM algorithm, forecast.

1. Introduction

In forecasting models, we generally assume that the data is corrupted by noise: y_t = f(x_t) + ε_t, where ε_t is drawn from a zero-mean probability distribution. The noise is usually assumed to be Gaussian, either because of arguments derived from the central limit theorem or simply to ease calculation. For example, the log likelihood of a Gaussian noise model is a quadratic function of the output variables, so in the training process the maximum likelihood solution can be estimated easily with standard optimisation algorithms. Software and frameworks for training machine learning models such as radial basis functions (RBF), MLPs, and linear regression (LR) with Gaussian noise can be found in [1]. Conversely, other noise models are much less tractable.

So why use the Student-t distribution? In our previous work [2, 3], we used models with Gaussian noise to forecast gas and electricity forward prices in the UK energy market. In those experiments, the kurtosis of the residuals (i.e. the difference between the target and the output of the forecasting model), which is a measure of how outlier-prone a distribution is, was between 16 and 17, whereas the kurtosis of the Gaussian distribution is 3. Furthermore, P(μ − 3σ < r < μ + 3σ) ≈ 0.982, where μ and σ are the mean and standard deviation of the residuals respectively; the equivalent probability for a Gaussian distribution is 0.997. The residual distributions therefore have heavy tails and are much more outlier-prone than the Normal distribution. The large number of outliers can make the training process unreliable and error bar estimates inaccurate, because Gaussians are sensitive to outliers. It is clear that this data is not modelled well by a Gaussian distribution, as has often been noted for financial data. As a consequence, a Student-t distribution can be considered a good alternative to a Gaussian, because it is a fat-tailed distribution and is more robust. Moreover, the Student-t distribution family contains the Normal distribution as a special case.

There are several previous studies of inference with Student-t models. Tipping and Lawrence proposed a framework for training an RBF model with fixed basis functions [4]. This study is a fully Bayesian treatment based on a variational…
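As a concrete illustration of the residual diagnostic above (our addition, not part of the paper): the sketch below computes the Pearson kurtosis and the three-sigma coverage of a set of residuals. Real model residuals are stood in for by synthetic Student-t draws with an assumed 5 degrees of freedom; both the distribution and its parameters are arbitrary choices for this example.

    import numpy as np
    from scipy import stats

    # Stand-in for real forecasting residuals: heavy-tailed Student-t samples
    # (df = 5 is an arbitrary illustrative choice, not a value from the paper).
    r = stats.t.rvs(df=5, size=100_000, random_state=0)

    # Pearson kurtosis (fisher=False): equals 3 for a Gaussian; heavy-tailed
    # residuals give a value well above 3.
    kurt = stats.kurtosis(r, fisher=False)

    # Fraction of residuals within three standard deviations of the mean;
    # for a Gaussian this is about 0.997, heavy tails give roughly 0.99 here.
    mu, sigma = r.mean(), r.std()
    within = np.mean((r > mu - 3 * sigma) & (r < mu + 3 * sigma))

    print(f"kurtosis = {kurt:.1f} (Gaussian: 3)")
    print(f"P(mu - 3*sigma < r < mu + 3*sigma) = {within:.3f} (Gaussian: 0.997)")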
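The robustness argument can also be made concrete. The sketch below (again our illustration, with arbitrary ν and scale values, not the paper's EM/variational procedure) compares the Gaussian and Student-t negative log-likelihoods on residuals containing one outlier: the Gaussian per-point loss grows quadratically in the residual, while the Student-t loss grows only logarithmically, so a single outlier contributes far less to the training objective.

    import numpy as np
    from scipy import stats

    def gaussian_nll(residuals, sigma=1.0):
        # Quadratic in the residual: one large residual dominates the sum.
        return -stats.norm.logpdf(residuals, scale=sigma).sum()

    def student_t_nll(residuals, nu=3.0, scale=1.0):
        # Grows like (nu + 1) * log|r| for large residuals.
        return -stats.t.logpdf(residuals, df=nu, scale=scale).sum()

    r = np.array([0.1, -0.2, 0.05, 8.0])  # last point is an outlier
    print(gaussian_nll(r))    # dominated by the 8.0 residual
    print(student_t_nll(r))   # the outlier is penalised far more mildly

This is one way of seeing why maximum likelihood training under a Student-t noise model is less easily destabilised by outlier-prone data than under a Gaussian one.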
Similar resources
Variational Inference and Learning for Continuous-Time Nonlinear State-Space Models
Inference in continuous-time stochastic dynamical models is a challenging problem. To complement existing sampling-based methods [2], variational methods have recently been developed for this problem [1]. Our approach, which was first introduced in [3], solves the variational continuous-time inference problem by discretisation that essentially reduces it to a discrete-time problem previously co...
Variational Bayes for Continuous-Time Nonlinear State-Space Models
We present an extension of the variational Bayesian nonlinear state-space model introduced by Valpola and Karhunen in 2002 [1] for continuous-time models. The model is based on using multilayer perceptron (MLP) networks to model the nonlinearities. Moving to continuous-time requires solving a stochastic differential equation (SDE) to evaluate the predictive distribution of the states, but other...
Gaussian process regression with Student-t likelihood
In Gaussian process regression the observation model is commonly assumed to be Gaussian, which is convenient from a computational perspective. However, the drawback is that the predictive accuracy of the model can be significantly compromised if the observations are contaminated by outliers. A robust observation model, such as the Student-t distribution, reduces the influence of outlying observ...
Variational Inference for Robust Sequential Learning of Multilayered Perceptron Neural Network
We derive a new sequential learning algorithm for Multilayered Perceptron (MLP) neural networks that is robust to outliers. The presence of outliers in data results in failure of the model, especially if data processing is performed on-line or in real time. The extended Kalman filter robust to outliers (EKF-OR) is a probabilistic generative model in which the measurement noise covariance is modeled as a stochastic proce...
Explaining Variational Approximations
Variational approximations facilitate approximate inference for the parameters in complex statistical models and provide fast, deterministic alternatives to Monte Carlo methods. However, much of the contemporary literature on variational approximations is in Computer Science rather than Statistics, and uses terminology, notation, and examples from the former field. In this article we explain va...
Journal: Neurocomputing
Volume: 73
Pages: -
Published: 2010